In this homework assignment, we will explore, analyze and model a
dataset of approximately 8,000 records, each representing a customer
at an auto insurance company. We will build multiple linear regression
models on the continuous variable TARGET_AMT and binary
logistic regression models on the boolean variable
TARGET_FLAG, to predict the probability that a person will
crash their car and the associated cost if they do.
We will build several models using different techniques and variable-selection strategies. To best assess our predictive models, we will carve a validation set out of our training data along an 80/20 training/testing proportion, before applying the finalized models to a separate evaluation dataset that does not contain the target variables.
The insurance training dataset contains 8161 observations of 26 variables; each record represents a customer at an auto insurance company. The evaluation dataset contains 2141 observations of 26 variables. These include demographic measures such as age and gender, socioeconomic measures such as education and household income, and vehicle-specific metrics such as car model, age and assessed value.
Each record also has two response variables. The first response
variable, TARGET_FLAG, is a boolean where “1” means that
the person was in a car crash. The second response variable,
TARGET_AMT, is numeric, indicating the (positive) cost if a
car crash occurred; this value is zero if the person did not crash their
car.
We can explore a sample of the training data here, and make some initial observations. Several categorical values carry a `z_` prefix that could be removed for readability.

The table below provides valuable descriptive statistics about the training data:
| Summary | Value |
|---|---|
| Name | train_df |
| Number of rows | 8161 |
| Number of columns | 25 |
| Character columns | 10 |
| Numeric columns | 15 |
| Group variables | None |
Variable type: character
| skim_variable | n_missing | complete_rate | min | max | empty | n_unique | whitespace |
|---|---|---|---|---|---|---|---|
| PARENT1 | 0 | 1 | 2 | 3 | 0 | 2 | 0 |
| MSTATUS | 0 | 1 | 2 | 3 | 0 | 2 | 0 |
| SEX | 0 | 1 | 1 | 1 | 0 | 2 | 0 |
| EDUCATION | 0 | 1 | 3 | 12 | 0 | 5 | 0 |
| JOB | 0 | 1 | 6 | 12 | 0 | 9 | 0 |
| CAR_USE | 0 | 1 | 7 | 10 | 0 | 2 | 0 |
| CAR_TYPE | 0 | 1 | 3 | 11 | 0 | 6 | 0 |
| RED_CAR | 0 | 1 | 2 | 3 | 0 | 2 | 0 |
| REVOKED | 0 | 1 | 2 | 3 | 0 | 2 | 0 |
| URBANICITY | 0 | 1 | 19 | 19 | 0 | 2 | 0 |
Variable type: numeric
| skim_variable | n_missing | complete_rate | mean | sd | p0 | p25 | p50 | p75 | p100 |
|---|---|---|---|---|---|---|---|---|---|
| TARGET_FLAG | 0 | 1.00 | 0.26 | 0.44 | 0 | 0 | 0 | 1 | 1.0 |
| TARGET_AMT | 0 | 1.00 | 1504.32 | 4704.03 | 0 | 0 | 0 | 1036 | 107586.1 |
| KIDSDRIV | 0 | 1.00 | 0.17 | 0.51 | 0 | 0 | 0 | 0 | 4.0 |
| AGE | 6 | 1.00 | 44.79 | 8.63 | 16 | 39 | 45 | 51 | 81.0 |
| HOMEKIDS | 0 | 1.00 | 0.72 | 1.12 | 0 | 0 | 0 | 1 | 5.0 |
| YOJ | 454 | 0.94 | 10.50 | 4.09 | 0 | 9 | 11 | 13 | 23.0 |
| INCOME | 445 | 0.95 | 61898.09 | 47572.68 | 0 | 28097 | 54028 | 85986 | 367030.0 |
| HOME_VAL | 464 | 0.94 | 154867.29 | 129123.77 | 0 | 0 | 161160 | 238724 | 885282.0 |
| TRAVTIME | 0 | 1.00 | 33.49 | 15.91 | 5 | 22 | 33 | 44 | 142.0 |
| BLUEBOOK | 0 | 1.00 | 15709.90 | 8419.73 | 1500 | 9280 | 14440 | 20850 | 69740.0 |
| TIF | 0 | 1.00 | 5.35 | 4.15 | 1 | 1 | 4 | 7 | 25.0 |
| OLDCLAIM | 0 | 1.00 | 4037.08 | 8777.14 | 0 | 0 | 0 | 4636 | 57037.0 |
| CLM_FREQ | 0 | 1.00 | 0.80 | 1.16 | 0 | 0 | 0 | 2 | 5.0 |
| MVR_PTS | 0 | 1.00 | 1.70 | 2.15 | 0 | 0 | 1 | 3 | 13.0 |
| CAR_AGE | 510 | 0.94 | 8.33 | 5.70 | -3 | 1 | 8 | 12 | 28.0 |
Based on this summary table and exploration of the data, we can make the following observations:

- Several variables have missing values: YOJ (6%), INCOME (5%), HOME_VAL (6%), CAR_AGE (6%), and AGE (<1%).
- CAR_AGE has a minimum value of -3, which doesn’t make intuitive sense.

Before building a model, we need to check how the two classes are represented in our TARGET_FLAG variable. Class 1 accounts for 26% and class 0 for 74% of the target variable. As a result, we have an unbalanced class distribution for our target variable, which may require additional steps (resampling, observation weights, etc.) before using logistic regression.
| Value | % |
|---|---|
| 0 | 0.74 |
| 1 | 0.26 |
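As a quick check, the class proportions above can be recomputed from the class counts, and one common mitigation (inverse-frequency observation weights, which `glm()` and `glmnet()` both accept via a `weights` argument) can be sketched alongside. The synthetic label vector below simply reproduces the counts from the skim table; it stands in for the real `TARGET_FLAG` column.

```r
# Stand-in labels reproducing the counts from the skim table
# (0: 6008, 1: 2153); in practice this would be train_df$TARGET_FLAG.
target_flag <- c(rep(0, 6008), rep(1, 2153))

# Class proportions, rounded as in the table above
class_props <- round(prop.table(table(target_flag)), 2)
print(class_props)   # 0: 0.74, 1: 0.26

# One simple mitigation: inverse-frequency weights for the minority class,
# which could be passed to glm()/glmnet() via their `weights` argument.
wts <- ifelse(target_flag == 1,
              sum(target_flag == 0) / sum(target_flag == 1), 1)
```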
Many of these distributions seem highly skewed and non-normal. As part of our data preparation we’ll use power transformations to find whether transforming variables to more normal distributions improves our models’ efficacy.
Commentary
Interestingly, none of our predictors appear to have strong linear
relationships to our TARGET_AMT response variable, which is
a primary assumption of linear regression. This suggests that
alternative methods might be more successful in modeling the
relationships.
Commentary
In order to work with our training dataset, we’ll need to first convert some variables to more useful data types:

- Parse the currency-formatted fields as numeric: INCOME, HOME_VAL, BLUEBOOK and OLDCLAIM.
- Convert the categorical fields to factors: TARGET_FLAG, CAR_TYPE, CAR_USE, EDUCATION, JOB, MSTATUS, PARENT1, RED_CAR, REVOKED, SEX and URBANICITY.

Before we go further, we need to identify and handle any missing, NA or negative data values so we can perform log transformations and regression.
First, we’ll apply transformations to clean up and align formatting of our variables:

- Remove the INDEX variable.
- Replace blank JOB values with ‘Unknown’.

Next, we’ll manually adjust two special cases of missing or outlier values:

- Where YOJ is zero and INCOME is NA, we’ll set INCOME to zero to avoid imputing new values over legitimate instances of non-employment.
- Where CAR_AGE is less than zero, we’ll assume this is a data collection error and set it to zero (representing a brand-new car).

We’ll use MICE to impute our remaining variables with missing values:
AGE, YOJ, CAR_AGE, INCOME and HOME_VAL.
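A minimal MICE imputation sketch, assuming `train_df` has been cleaned as above (the `m`, `method` and `seed` choices here are our assumptions, not taken from the report):

```r
# Sketch: multiple imputation with the mice package over the columns
# that still contain NAs. method = "pmm" (predictive mean matching)
# and seed = 123 are illustrative choices.
library(mice)

cols <- c("AGE", "YOJ", "CAR_AGE", "INCOME", "HOME_VAL")
imp  <- mice(train_df[, cols], m = 5, method = "pmm",
             seed = 123, printFlag = FALSE)
train_df[, cols] <- complete(imp)   # fill NAs with the first imputed set
```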
Next we’ll want to consider power transformations for variables
that have skewed distributions. For example, our numeric response
variable TARGET_AMT is a good candidate for transformation,
as its distribution is very highly skewed and linear regression assumes
approximately normal residuals.

- Apply log transformations to INCOME, TARGET_AMT and OLDCLAIM to bring their distributions from right-skewed toward normal.
- Apply power transformations to BLUEBOOK, TRAVTIME and TIF so they also are more normally distributed.

To give our models more variables to work with, we’ll engineer some additional features:
- Create binned versions of CAR_AGE, HOME_VAL and TIF (CAR_AGE_BIN, HOME_VAL_BIN and TIF_BIN).
- Recode the two-level character variables as binary factors: MALE, MARRIED, LIC_REVOKED, CAR_RED, PRIVATE_USE, SINGLE_PARENT and URBAN.

We can examine our final, transformed training dataset and
distributions below (with a temporary numeric variable
CAR_CRASH to represent TARGET_FLAG):
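The transformations and engineered features described above can be sketched on a tiny synthetic frame. The column names match the report; the `eps` offset (which reproduces the -4.61 minimum of the transformed TARGET_AMT) and the CAR_AGE cut points (its p25/p50/p75 from the skim table) are our inferences, and the `log_shift` helper is ours.

```r
# log with a small offset so zeros stay finite: log(0 + 0.01) = -4.61,
# matching the transformed TARGET_AMT minimum in the table below.
log_shift <- function(x, eps = 0.01) log(x + eps)

df <- data.frame(TARGET_AMT = c(0, 1036, 107586),
                 CAR_AGE    = c(0, 8, 28),
                 SEX        = c("M", "z_F", "M"))

df$TARGET_AMT  <- log_shift(df$TARGET_AMT)          # right-skewed -> ~normal
df$CAR_AGE_BIN <- cut(df$CAR_AGE,                    # quartile-style bins
                      breaks = c(-Inf, 1, 8, 12, Inf),
                      labels = FALSE)
df$MALE        <- factor(as.integer(df$SEX == "M"))  # 0/1 binary recode
```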
| Summary | Value |
|---|---|
| Name | train_df |
| Number of rows | 8161 |
| Number of columns | 28 |
| Factor columns | 11 |
| Numeric columns | 17 |
| Group variables | None |
Variable type: factor
| skim_variable | n_missing | complete_rate | ordered | n_unique | top_counts |
|---|---|---|---|---|---|
| TARGET_FLAG | 0 | 1 | FALSE | 2 | 0: 6008, 1: 2153 |
| EDUCATION | 0 | 1 | FALSE | 5 | Hig: 2330, Bac: 2242, Mas: 1658, <Hi: 1203 |
| JOB | 0 | 1 | FALSE | 9 | Blu: 1825, Cle: 1271, Pro: 1117, Man: 988 |
| CAR_TYPE | 0 | 1 | FALSE | 6 | SUV: 2294, Min: 2145, Pic: 1389, Spo: 907 |
| MALE | 0 | 1 | FALSE | 2 | 0: 4375, 1: 3786 |
| MARRIED | 0 | 1 | FALSE | 2 | 1: 4894, 0: 3267 |
| LIC_REVOKED | 0 | 1 | FALSE | 2 | 0: 7161, 1: 1000 |
| CAR_RED | 0 | 1 | FALSE | 2 | 0: 5783, 1: 2378 |
| PRIVATE_USE | 0 | 1 | FALSE | 2 | 1: 5132, 0: 3029 |
| SINGLE_PARENT | 0 | 1 | FALSE | 2 | 0: 7084, 1: 1077 |
| URBAN | 0 | 1 | FALSE | 2 | 1: 6492, 0: 1669 |
Variable type: numeric
| skim_variable | n_missing | complete_rate | mean | sd | p0 | p25 | p50 | p75 | p100 |
|---|---|---|---|---|---|---|---|---|---|
| TARGET_AMT | 0 | 1 | -1.21 | 5.69 | -4.61 | -4.61 | -4.61 | 6.94 | 11.59 |
| KIDSDRIV | 0 | 1 | 0.17 | 0.51 | 0.00 | 0.00 | 0.00 | 0.00 | 4.00 |
| AGE | 0 | 1 | 44.78 | 8.63 | 16.00 | 39.00 | 45.00 | 51.00 | 81.00 |
| HOMEKIDS | 0 | 1 | 0.72 | 1.12 | 0.00 | 0.00 | 0.00 | 1.00 | 5.00 |
| YOJ | 0 | 1 | 10.48 | 4.10 | 0.00 | 9.00 | 11.00 | 13.00 | 23.00 |
| INCOME | 0 | 1 | 61461.41 | 47434.08 | 0.00 | 27684.00 | 53483.00 | 85479.00 | 367030.00 |
| HOME_VAL | 0 | 1 | 154644.85 | 129340.41 | 0.00 | 0.00 | 160874.00 | 238349.00 | 885282.00 |
| TRAVTIME | 0 | 1 | 15.08 | 5.82 | 3.00 | 11.17 | 15.34 | 19.12 | 45.61 |
| BLUEBOOK | 0 | 1 | 182.32 | 48.72 | 62.21 | 147.95 | 182.18 | 216.49 | 381.01 |
| TIF | 0 | 1 | 1.63 | 1.25 | 0.00 | 0.00 | 1.62 | 2.43 | 4.70 |
| OLDCLAIM | 0 | 1 | 0.56 | 6.54 | -4.61 | -4.61 | -4.61 | 8.44 | 10.95 |
| CLM_FREQ | 0 | 1 | 0.80 | 1.16 | 0.00 | 0.00 | 0.00 | 2.00 | 5.00 |
| MVR_PTS | 0 | 1 | 1.70 | 2.15 | 0.00 | 0.00 | 1.00 | 3.00 | 13.00 |
| CAR_AGE | 0 | 1 | 8.34 | 5.70 | 0.00 | 1.00 | 8.00 | 12.00 | 28.00 |
| CAR_AGE_BIN | 0 | 1 | 2.48 | 1.12 | 1.00 | 1.00 | 2.00 | 3.00 | 4.00 |
| HOME_VAL_BIN | 0 | 1 | 2.45 | 1.16 | 1.00 | 1.00 | 2.00 | 3.00 | 4.00 |
| TIF_BIN | 0 | 1 | 2.41 | 1.16 | 1.00 | 1.00 | 2.00 | 3.00 | 4.00 |
We can use Mosaic Plots to illustrate the relationship of binary
factor variables to TARGET_FLAG:
We can also use Mosaic Plots to illustrate the relationship of
multi-level factor variables to TARGET_FLAG:
To proceed with modeling, we’ll split our training data into train (80%) and validation (20%) datasets.
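The 80/20 split can be sketched in base R as below; the seed value and the helper function are our assumptions (`caret::createDataPartition`, which stratifies on the response, would be a common alternative).

```r
# Reproducible 80/20 train/validation split of a data frame.
set.seed(123)                     # seed choice is illustrative

split_80_20 <- function(df) {
  idx <- sample(seq_len(nrow(df)), size = floor(0.8 * nrow(df)))
  list(train = df[idx, , drop = FALSE],
       valid = df[-idx, , drop = FALSE])
}

# Usage with the prepared data: parts <- split_80_20(train_df)
```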
## 41 x 1 sparse Matrix of class "dgCMatrix"
## s1
## (Intercept) -1.450670e+00
## KIDSDRIV 8.242256e-01
## AGE -1.590170e-03
## HOMEKIDS 2.966269e-02
## YOJ -1.843188e-02
## INCOME -6.032096e-06
## HOME_VAL .
## EDUCATIONBachelors -6.083644e-01
## EDUCATIONHigh School 6.004131e-02
## EDUCATIONMasters -4.377078e-01
## EDUCATIONPhD -1.234115e-01
## JOBClerical 1.422133e-01
## JOBDoctor -9.728426e-01
## JOBHome Maker .
## JOBLawyer -1.137389e-01
## JOBManager -1.529192e+00
## JOBProfessional -1.013136e-01
## JOBStudent -1.994015e-01
## JOBUnknown -6.904978e-01
## TRAVTIME 6.940606e-02
## BLUEBOOK -7.084013e-03
## TIF -1.700780e-01
## CAR_TYPEPanel Truck 6.532344e-01
## CAR_TYPEPickup 7.223041e-01
## CAR_TYPESports Car 1.591797e+00
## CAR_TYPESUV 1.135741e+00
## CAR_TYPEVan 7.850491e-01
## OLDCLAIM 4.862112e-02
## CLM_FREQ 7.523577e-02
## MVR_PTS 2.709291e-01
## CAR_AGE -7.336134e-03
## CAR_AGE_BIN .
## HOME_VAL_BIN -3.339554e-01
## TIF_BIN -1.534265e-01
## MALE1 4.100768e-02
## MARRIED1 -8.001639e-01
## LIC_REVOKED1 1.591403e+00
## CAR_RED1 3.091972e-02
## PRIVATE_USE1 -1.677192e+00
## SINGLE_PARENT1 1.042326e+00
## URBAN1 3.650730e+00
## 41 x 1 sparse Matrix of class "dgCMatrix"
## s1
## (Intercept) -1.123592e+00
## KIDSDRIV 5.781403e-01
## AGE .
## HOMEKIDS .
## YOJ .
## INCOME -6.145362e-06
## HOME_VAL .
## EDUCATIONBachelors .
## EDUCATIONHigh School 2.565739e-01
## EDUCATIONMasters .
## EDUCATIONPhD .
## JOBClerical .
## JOBDoctor .
## JOBHome Maker .
## JOBLawyer .
## JOBManager -1.060109e+00
## JOBProfessional .
## JOBStudent .
## JOBUnknown .
## TRAVTIME 4.069176e-02
## BLUEBOOK -5.715392e-03
## TIF -1.354175e-01
## CAR_TYPEPanel Truck .
## CAR_TYPEPickup .
## CAR_TYPESports Car 5.614647e-01
## CAR_TYPESUV 2.868858e-01
## CAR_TYPEVan .
## OLDCLAIM 6.328528e-02
## CLM_FREQ 1.384936e-03
## MVR_PTS 2.502005e-01
## CAR_AGE -1.463889e-02
## CAR_AGE_BIN .
## HOME_VAL_BIN -3.804650e-01
## TIF_BIN -6.383520e-02
## MALE1 .
## MARRIED1 -4.457439e-01
## LIC_REVOKED1 1.321738e+00
## CAR_RED1 .
## PRIVATE_USE1 -1.384149e+00
## SINGLE_PARENT1 1.010721e+00
## URBAN1 2.952228e+00
##
## Call: cv.glmnet(x = X, y = Y, nfolds = 5, family = "gaussian", standardize = TRUE, alpha = 1)
##
## Measure: Mean-Squared Error
##
## Lambda Index Measure SE Nonzero
## min 0.01331 51 25.33 0.4037 37
## 1se 0.16407 24 25.73 0.4060 20
## $mse
## lambda.1se
## 25.50668
## attr(,"measure")
## [1] "Mean-Squared Error"
##
## $mae
## lambda.1se
## 4.166593
## attr(,"measure")
## [1] "Mean Absolute Error"
## $AICc
## [1] -48241.06
##
## $BIC
## [1] -47990.49
## $AICc
## [1] -44758.86
##
## $BIC
## [1] -44623.31
## 41 x 1 sparse Matrix of class "dgCMatrix"
## s1
## (Intercept) -1.450670e+00
## KIDSDRIV 8.242256e-01
## AGE -1.590170e-03
## HOMEKIDS 2.966269e-02
## YOJ -1.843188e-02
## INCOME -6.032096e-06
## HOME_VAL .
## EDUCATIONBachelors -6.083644e-01
## EDUCATIONHigh School 6.004131e-02
## EDUCATIONMasters -4.377078e-01
## EDUCATIONPhD -1.234115e-01
## JOBClerical 1.422133e-01
## JOBDoctor -9.728426e-01
## JOBHome Maker .
## JOBLawyer -1.137389e-01
## JOBManager -1.529192e+00
## JOBProfessional -1.013136e-01
## JOBStudent -1.994015e-01
## JOBUnknown -6.904978e-01
## TRAVTIME 6.940606e-02
## BLUEBOOK -7.084013e-03
## TIF -1.700780e-01
## CAR_TYPEPanel Truck 6.532344e-01
## CAR_TYPEPickup 7.223041e-01
## CAR_TYPESports Car 1.591797e+00
## CAR_TYPESUV 1.135741e+00
## CAR_TYPEVan 7.850491e-01
## OLDCLAIM 4.862112e-02
## CLM_FREQ 7.523577e-02
## MVR_PTS 2.709291e-01
## CAR_AGE -7.336134e-03
## CAR_AGE_BIN .
## HOME_VAL_BIN -3.339554e-01
## TIF_BIN -1.534265e-01
## MALE1 4.100768e-02
## MARRIED1 -8.001639e-01
## LIC_REVOKED1 1.591403e+00
## CAR_RED1 3.091972e-02
## PRIVATE_USE1 -1.677192e+00
## SINGLE_PARENT1 1.042326e+00
## URBAN1 3.650730e+00
## # A tibble: 5 × 3
## .metric .estimator .estimate
## <chr> <chr> <dbl>
## 1 mape standard 70.5
## 2 smape standard 110.
## 3 mase standard 0.776
## 4 mpe standard 66.1
## 5 rmse standard 5.03
## 41 x 1 sparse Matrix of class "dgCMatrix"
## s1
## (Intercept) -1.716098e+00
## KIDSDRIV 4.114268e-01
## AGE -1.274043e-03
## HOMEKIDS 2.779506e-02
## YOJ -5.912864e-03
## INCOME -3.964779e-06
## HOME_VAL .
## EDUCATIONBachelors -3.129844e-01
## EDUCATIONHigh School 1.563180e-02
## EDUCATIONMasters -2.467550e-01
## EDUCATIONPhD -4.260311e-02
## JOBClerical 6.781448e-02
## JOBDoctor -4.680465e-01
## JOBHome Maker -2.933138e-02
## JOBLawyer .
## JOBManager -7.506532e-01
## JOBProfessional -3.377667e-02
## JOBStudent -1.395579e-01
## JOBUnknown -2.049771e-01
## TRAVTIME 3.869096e-02
## BLUEBOOK -4.230398e-03
## TIF -5.004348e-02
## CAR_TYPEPanel Truck 4.776375e-01
## CAR_TYPEPickup 4.281607e-01
## CAR_TYPESports Car 8.755467e-01
## CAR_TYPESUV 6.355374e-01
## CAR_TYPEVan 5.122177e-01
## OLDCLAIM 2.504929e-02
## CLM_FREQ 3.513050e-02
## MVR_PTS 1.068372e-01
## CAR_AGE -2.727552e-03
## CAR_AGE_BIN .
## HOME_VAL_BIN -1.660078e-01
## TIF_BIN -1.261495e-01
## MALE1 5.464939e-03
## MARRIED1 -4.532368e-01
## LIC_REVOKED1 6.960323e-01
## CAR_RED1 3.006562e-02
## PRIVATE_USE1 -8.083996e-01
## SINGLE_PARENT1 3.882197e-01
## URBAN1 2.285270e+00
## 41 x 1 sparse Matrix of class "dgCMatrix"
## s1
## (Intercept) -1.464448e+00
## KIDSDRIV 3.295352e-01
## AGE -1.769623e-04
## HOMEKIDS 1.048500e-02
## YOJ .
## INCOME -3.680548e-06
## HOME_VAL .
## EDUCATIONBachelors -1.070104e-01
## EDUCATIONHigh School 7.383992e-02
## EDUCATIONMasters -3.456691e-02
## EDUCATIONPhD .
## JOBClerical 3.677502e-02
## JOBDoctor -7.665609e-02
## JOBHome Maker .
## JOBLawyer .
## JOBManager -5.667630e-01
## JOBProfessional .
## JOBStudent .
## JOBUnknown .
## TRAVTIME 2.797189e-02
## BLUEBOOK -3.228510e-03
## TIF -5.098952e-02
## CAR_TYPEPanel Truck .
## CAR_TYPEPickup 6.408488e-02
## CAR_TYPESports Car 4.575649e-01
## CAR_TYPESUV 2.774664e-01
## CAR_TYPEVan 3.657761e-02
## OLDCLAIM 2.599506e-02
## CLM_FREQ 1.974628e-02
## MVR_PTS 9.972623e-02
## CAR_AGE -7.803403e-03
## CAR_AGE_BIN .
## HOME_VAL_BIN -1.642793e-01
## TIF_BIN -8.078956e-02
## MALE1 .
## MARRIED1 -3.288395e-01
## LIC_REVOKED1 6.046503e-01
## CAR_RED1 .
## PRIVATE_USE1 -7.491766e-01
## SINGLE_PARENT1 3.783721e-01
## URBAN1 1.913118e+00
##
## Call: cv.glmnet(x = X, y = Y, nfolds = 5, family = "binomial", link = "logit", standardize = TRUE, alpha = 1)
##
## Measure: Binomial Deviance
##
## Lambda Index Measure SE Nonzero
## min 0.001033 51 0.9039 0.010248 37
## 1se 0.007285 30 0.9132 0.009009 28
## $deviance
## lambda.min
## 0.8930299
## attr(,"measure")
## [1] "Binomial Deviance"
##
## $class
## lambda.min
## 0.2074142
## attr(,"measure")
## [1] "Misclassification Error"
##
## $auc
## [1] 0.8143323
## attr(,"measure")
## [1] "AUC"
##
## $mse
## lambda.min
## 0.2897296
## attr(,"measure")
## [1] "Mean-Squared Error"
##
## $mae
## lambda.min
## 0.5843315
## attr(,"measure")
## [1] "Mean Absolute Error"
## $AICc
## [1] -1628.926
##
## $BIC
## [1] -1378.357
## $AICc
## [1] -1565.658
##
## $BIC
## [1] -1375.96
## 41 x 1 sparse Matrix of class "dgCMatrix"
## s1
## (Intercept) -1.716098e+00
## KIDSDRIV 4.114268e-01
## AGE -1.274043e-03
## HOMEKIDS 2.779506e-02
## YOJ -5.912864e-03
## INCOME -3.964779e-06
## HOME_VAL .
## EDUCATIONBachelors -3.129844e-01
## EDUCATIONHigh School 1.563180e-02
## EDUCATIONMasters -2.467550e-01
## EDUCATIONPhD -4.260311e-02
## JOBClerical 6.781448e-02
## JOBDoctor -4.680465e-01
## JOBHome Maker -2.933138e-02
## JOBLawyer .
## JOBManager -7.506532e-01
## JOBProfessional -3.377667e-02
## JOBStudent -1.395579e-01
## JOBUnknown -2.049771e-01
## TRAVTIME 3.869096e-02
## BLUEBOOK -4.230398e-03
## TIF -5.004348e-02
## CAR_TYPEPanel Truck 4.776375e-01
## CAR_TYPEPickup 4.281607e-01
## CAR_TYPESports Car 8.755467e-01
## CAR_TYPESUV 6.355374e-01
## CAR_TYPEVan 5.122177e-01
## OLDCLAIM 2.504929e-02
## CLM_FREQ 3.513050e-02
## MVR_PTS 1.068372e-01
## CAR_AGE -2.727552e-03
## CAR_AGE_BIN .
## HOME_VAL_BIN -1.660078e-01
## TIF_BIN -1.261495e-01
## MALE1 5.464939e-03
## MARRIED1 -4.532368e-01
## LIC_REVOKED1 6.960323e-01
## CAR_RED1 3.006562e-02
## PRIVATE_USE1 -8.083996e-01
## SINGLE_PARENT1 3.882197e-01
## URBAN1 2.285270e+00
## True
## Predicted 0 1 Total
## 0 1102 244 1346
## 1 100 187 287
## Total 1202 431 1633
##
## Percent Correct: 0.7893
Lasso regression mitigates multicollinearity: among a group of correlated predictors, it tends to retain the one with the strongest effect while shrinking the others’ coefficients to (exactly) zero.
**Model 3.1: Lasso Linear Regression**
**Model 4.1: Lasso Logistic Regression**
## F1
## 0.5208914
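The F1 score above can be verified directly from the confusion matrix printed earlier (predicted rows by true columns): 187 true positives, 100 false positives, 244 false negatives.

```r
# Recompute precision, recall and F1 from the confusion matrix cells.
tp <- 187; fp <- 100; fn <- 244

precision <- tp / (tp + fp)                           # 187/287
recall    <- tp / (tp + fn)                           # 187/431
f1 <- 2 * precision * recall / (precision + recall)   # = 2*TP/(2*TP+FP+FN)

round(f1, 7)                                          # 0.5208914
```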
Regression
| Model | mape | smape | mase | mpe | rmse | AICc |
|---|---|---|---|---|---|---|
| M3.1: Lasso Linear | 70.4935 | 110.106 | 0.7764 | 66.0545 | 5.0267 | -48241.06 |
Logistic
| Model | Accuracy | Classification error rate | F1 | Deviance | R2 | Sensitivity | Specificity | Precision | AICc |
|---|---|---|---|---|---|---|---|---|---|
| M4.1: Lasso Logistic | 0.8091 | 0.1909 | 0.5209 | 0.9021 | NA | 0.4339 | 0.9168 | 0.6516 | -1628.926 |